Analytical Visualization on the History of Himalayan Expeditions
INFO 526 - Summer 2024 - Final Project
Abstract
The Himalayan Database, a meticulous archive originating from the pioneering work of Elizabeth Hawley, provides an invaluable resource for understanding the history of mountaineering in the Nepal Himalaya. This project leverages a subset of this comprehensive database, specifically focusing on expedition data recorded between 2020 and 2024, to analyze key trends and influential factors in contemporary Himalayan climbing. Utilizing two tidy tibbles, ‘peaks’ and ‘expeditions,’ we investigate patterns related to seasonality, success rates, and national participation within this recent timeframe. Furthermore, the analysis delves into the relationships between critical expedition choices, such as selected routes and agency affiliations, and their impact on expedition outcomes, including success probabilities and fatality risks. By examining these variables across diverse nationalities and temporal periods within this focused dataset, this study aims to contribute a deeper, data-driven understanding of the multifaceted elements influencing mountaineering endeavors in the challenging Himalayan environment.
Introduction
Mountaineering in the Nepal Himalaya represents one of humanity’s most profound engagements with extreme natural environments, characterized by unparalleled challenges and breathtaking achievements. Understanding the dynamics and outcomes of these expeditions is crucial for both historical context and future endeavors. This project embarks on an exploratory analysis of a specific segment of the rich historical data encapsulated within The Himalayan Database, an enduring legacy of Elizabeth Hawley’s dedicated efforts to document every facet of Himalayan climbing history. Originally compiled from a vast array of sources and made freely available online since 2017, the full database serves as a cornerstone for mountaineering research.
Our study specifically focuses on intriguing patterns and insights derived from mountaineering expeditions undertaken in the Nepal Himalaya during the years 2020 to 2024. By analyzing this extensive yet focused dataset of Himalayan climbs (structured into ‘peaks’ and ‘expeditions’ tibbles), this project seeks to uncover significant relationships between the strategic choices climbers make, such as their selected routes and expedition agencies, and their chances of achieving success or facing the tragic risk of fatalities. The analysis particularly aims to shed light on how these critical factors vary across different nations and evolving time periods within this contemporary five-year window. Through this focused exploration of a recent subset of The Himalayan Database, we aspire to offer a deeper, data-driven understanding of what influences expedition outcomes in one of the world’s most challenging and captivating mountaineering environments.
Question 1
Are Certain Routes Favored by Expeditions from Particular Nations, and Do They Have Disparate Success Rates?
Introduction
This section initiates our investigation into the strategic choices made by mountaineering expeditions in the Nepal Himalaya and their correlation with expedition outcomes. Specifically, we aim to uncover whether the selection of particular climbing routes is influenced by the nationality of the expedition team. Furthermore, we will explore if these nationally-favored routes demonstrate significantly different success rates, potentially highlighting variations in national climbing philosophies, accumulated experience on specific routes, or inherent disparities in route difficulty. A crucial aspect of our analysis involves truncating low-volume attempts. This methodological decision is made to avoid the undue influence of statistical outliers that could arise from a small number of expeditions on a given route, ensuring that our insights are derived from more statistically robust patterns.
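A minimal base-R sketch of what truncating low-volume attempts can look like, assuming one row per expedition with the peakid, route1, nation, and success1 columns that appear in the expeditions tibble printed later in this report. The toy rows and the min_attempts cutoff are hypothetical; the report does not state the threshold it actually used.

```r
# Hypothetical toy data: one row per expedition, using column names from the
# 'expeditions' tibble (peakid, route1, nation, success1).
exped <- data.frame(
  peakid   = rep("EVER", 7),
  route1   = c("S Col-SE Ridge", "S Col-SE Ridge", "S Col-SE Ridge",
               "S Col-SE Ridge", "N Col-NE Ridge", "N Col-NE Ridge",
               "N Face"),
  nation   = c("USA", "USA", "USA", "Nepal", "China", "China", "UK"),
  success1 = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Count attempts and successes for each peak-route-nation combination.
exped$attempts  <- 1L
exped$successes <- as.integer(exped$success1)
counts <- aggregate(
  exped[, c("attempts", "successes")],
  by  = exped[, c("peakid", "route1", "nation")],
  FUN = sum
)

# Drop low-volume combinations (hypothetical cutoff), then compute the
# success rate in percent for the combinations that remain.
min_attempts <- 2
kept <- counts[counts$attempts >= min_attempts, ]
kept$success_rate <- 100 * kept$successes / kept$attempts
kept
```

In this toy example the single-attempt Nepal and UK rows are removed, leaving USA on the S Col-SE Ridge (2 of 3, about 66.7%) and China on the N Col-NE Ridge (1 of 2, 50%). A higher cutoff gives more stable rates at the cost of hiding smaller national teams.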
Approach
Our analytical approach to address this question is structured into three main phases:
1. Peak-Specific Success Rate Calibration: To establish a foundational understanding, we first calibrated the general success rates for the four most frequently attempted peaks within our 2020-2024 expeditions dataset. This step provides a broad overview of expedition success for these prominent summits, setting a comparative context for the more granular route-specific analysis.
2. Visualization of Route Success by Nation: Following the general calibration, we visualized the success percentages of the most popular routes on each of these four peaks, broken down by participating nation, using bubble charts. In these charts, the size (and color) of each bubble represents the number of attempts by a specific nation on a particular route, while its vertical position indicates the corresponding success percentage. This gives a clear, intuitive representation of both the popularity and the efficacy of routes across different national teams.
3. Integrated Visualization and Interpretation: To facilitate a comprehensive comparative analysis and derive overarching interpretations, the individual bubble charts for all four peaks were combined into a single, integrated visualization. This unified view enables a direct comparison of route preferences and success rate disparities across multiple popular peaks and diverse national expedition teams, offering deeper insights into the interplay of national origin, route choice, and expedition success in the challenging Himalayan environment.
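Step 1 might be computed along the following lines. This is a base-R sketch under stated assumptions: success is read from the success1 column, the toy peaks and outcomes are hypothetical, and the output columns attempts, success_rate, and success_rate_label are named to match the aesthetics used by the general_census bar chart below, not taken from the authors' preprocessing code.

```r
# Hypothetical toy inputs shaped like the 'peaks' and 'expeditions' tibbles.
peaks <- data.frame(
  peakid = c("EVER", "AMAD", "LHOT", "MANA", "CHOY"),
  pkname = c("Everest", "Ama Dablam", "Lhotse", "Manaslu", "Cho Oyu"),
  stringsAsFactors = FALSE
)
exped <- data.frame(
  peakid   = c(rep("EVER", 5), rep("AMAD", 4), rep("LHOT", 3),
               rep("MANA", 2), "CHOY"),
  success1 = c(TRUE, TRUE, TRUE, TRUE, FALSE,   # Everest: 4 of 5
               TRUE, TRUE, TRUE, FALSE,         # Ama Dablam: 3 of 4
               TRUE, TRUE, FALSE,               # Lhotse: 2 of 3
               TRUE, FALSE,                     # Manaslu: 1 of 2
               TRUE),                           # Cho Oyu: 1 of 1
  stringsAsFactors = FALSE
)

# Attempts and successes per peak (tapply returns results sorted by peakid,
# so both vectors share the same order).
attempts  <- tapply(exped$success1, exped$peakid, length)
successes <- tapply(exped$success1, exped$peakid, sum)
summary_data <- data.frame(
  peakid       = names(attempts),
  attempts     = as.vector(attempts),
  success_rate = 100 * as.vector(successes) / as.vector(attempts),
  stringsAsFactors = FALSE
)

# Attach peak names, keep the four most-attempted peaks, and build the
# text label drawn above each bar.
summary_data <- merge(summary_data, peaks, by = "peakid")
summary_data <- head(summary_data[order(-summary_data$attempts), ], 4)
summary_data$success_rate_label <- sprintf("%.1f%%", summary_data$success_rate)
summary_data[, c("pkname", "attempts", "success_rate_label")]
```

The same grouping idea, applied per route and nation instead of per peak, yields the per-bubble values used in step 2.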
Analysis
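library(ggplot2)    # ggplot(), geoms, scales, and themes
library(ggrepel)    # geom_text_repel()
library(scales)     # label_wrap()
library(patchwork)  # combining plots with +, /, and plot_layout()

general_census <- ggplot(summary_data, aes(x = reorder(pkname, -attempts), y = success_rate, fill = attempts)) +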
geom_col() +
geom_text(aes(label = success_rate_label),
vjust = -0.5, # Position above bars
color = "black") +
scale_fill_viridis_c(
option = "viridis",
name = "Number of Attempts"
) +
coord_cartesian(ylim = c(0, 100)) + # Success rate is in percent (0-100)
labs(
x = "Peaks",
y = "Success Rate (%)",
title = "Success Rate of Top 4 Peaks Attempted by all Nations",
subtitle = "Bar height indicates success rate; color indicates attempts",
caption = "Source: https://github.com/rfordatascience/tidytuesday"
) +
theme_minimal(base_size = 14)

p1 <- ggplot(ever_top3, aes(x = route, y = success_rate_on_route)) +
geom_point(aes(color = attempts_on_route_per_nation, size = attempts_on_route_per_nation),
alpha = 0.7,
position = "identity") +
geom_text_repel(data = subset(ever_top3, attempts_on_route_per_nation >= 1),
aes(label = nation, color = attempts_on_route_per_nation),
size = 5,
box.padding = 0.7,
point.padding = 0.6,
min.segment.length = Inf) +
scale_color_viridis_c(option = "turbo",
name = "Attempts on Route",
breaks = c(1, 10, 20, 30, 40),
limits = c(0, 40),
guide = "none") +
scale_size_continuous(range = c(3, 15),
name = "Attempts on Route",
breaks = c(10, 20, 30),
limits = c(0, 40),
guide = "none")+
annotate("text", y = 125, x = 0.7, label = "Everest", size = 5, fontface = "bold") +
labs(x = NULL,
y = "Success Rate (%)") +
scale_y_continuous(breaks = seq(0, 100, by = 25)) +
scale_x_discrete(labels = label_wrap(10)) +
coord_cartesian(ylim = c(-5, 125)) +
theme_minimal() +
theme(
axis.text.x = element_text(hjust = 0.5, size = 9),
legend.position = "none"
)

p2 <- ggplot(amad_top3, aes(x = route, y = success_rate_on_route)) +
geom_point(aes(color = attempts_on_route_per_nation, size = attempts_on_route_per_nation),
alpha = 0.7,
position = "identity") +  # no jitter, so each bubble sits at its exact success rate
geom_text_repel(data = subset(amad_top3, attempts_on_route_per_nation >= 1),
aes(label = nation, color = attempts_on_route_per_nation),
size = 5,
box.padding = 0.6,
point.padding = 0.8,
min.segment.length = Inf) +
scale_color_viridis_c(option = "turbo",
name = "Attempts on Route",
breaks = c(1, 10, 20, 30, 40),
limits = c(0, 40),
guide = "none") +
scale_size_continuous(range = c(3, 15),
name = "Attempts on Route",
breaks = c(10, 20, 30),
limits = c(0, 40),
guide = "none") +
annotate("text", y = 120, x = 0.9, label = "Ama Dablam", size = 5, fontface = "bold") +
labs(x = NULL,
y = NULL) +
scale_y_continuous(breaks = seq(0, 100, by = 25)) + # Adjusted seq start to 0 for clarity
scale_x_discrete(labels = label_wrap(10)) +
coord_cartesian(ylim = c(-5, 120)) +
theme_minimal() +
theme(
axis.text.x = element_text(hjust = 0.5, size = 9),
legend.position = "none"
)

p3 <- ggplot(lhot_top3, aes(x = route, y = success_rate_on_route)) +
geom_point(aes(color = attempts_on_route_per_nation, size = attempts_on_route_per_nation),
alpha = 0.6,
position = "identity") +
geom_text_repel(data = subset(lhot_top3, attempts_on_route_per_nation >= 1),
aes(label = nation, color = attempts_on_route_per_nation),
size = 5,
box.padding = 0.6,
point.padding = 0.8,
min.segment.length = Inf,
position = "identity") +
scale_color_viridis_c(option = "turbo",
name = "Attempts on Route",
breaks = c(1, 10, 20, 30, 40),
limits = c(0, 40),
guide = "none") +
scale_size_continuous(range = c(3, 15),
name = "Attempts on Route",
breaks = c(10, 20, 30),
limits = c(0, 40),
guide = "none")+
annotate("text", y = 130, x = 0.6, label = "Lhotse", size = 5, fontface = "bold") +
labs(x = "Route",
y = "Success Rate (%)") +
scale_y_continuous(breaks = seq(0, 100, by = 25)) +
scale_x_discrete(labels = label_wrap(10)) +
coord_cartesian(ylim = c(-5, 130)) +
theme_minimal() +
theme(
axis.text.x = element_text(hjust = 0.5, size = 9),
legend.position = "none"
)

p4 <- ggplot(mana_top3, aes(x = route, y = success_rate_on_route)) +
geom_point(aes(color = attempts_on_route_per_nation, size = attempts_on_route_per_nation),
alpha = 0.7,
position = "identity") +
geom_text_repel(data = subset(mana_top3, attempts_on_route_per_nation >= 1),
aes(label = nation, color = attempts_on_route_per_nation),
size = 5,
box.padding = 0.6,
point.padding = 0.8,
min.segment.length = Inf,
position = "identity") +
scale_color_viridis_c(option = "turbo",
name = "\n",
breaks = c(1, 10, 20, 30, 40),
limits = c(0, 40),
guide = guide_colorbar(direction = "horizontal", title.position = "top")) +
scale_size_continuous(range = c(3, 15),
name = "Attempts on route metric (size + color)",
breaks = c(1, 10, 20, 30, 40),
limits = c(0, 40),
guide = guide_legend(title.position = "top"))+
annotate("text", y = 120, x = 0.55, label = "Manaslu", size = 5, fontface = "bold") +
labs(x = "Route",
y = NULL) +
scale_y_continuous(breaks = seq(0, 100, by = 25)) +
scale_x_discrete(labels = label_wrap(10)) +
coord_cartesian(ylim = c(-5, 120)) +
theme_minimal() +
theme(
axis.text.x = element_text(hjust = 0.5, size = 9),
legend.position = "none"
)

combined_plot <- (p1 + p2) / (p3 + p4) +
plot_layout(guides = "collect") +
plot_annotation(
title = "National Route Preferences and Success Rates\nin High-Altitude Peak Expeditions (2020-2024)",
subtitle = "Bubble Size and Color Show Attempts, with Success Rates\nfor Top 3 Nations Across Four Most Popular Peaks",
caption = "Source: https://github.com/rfordatascience/tidytuesday",
theme = theme(
plot.title = element_text(face = "bold", size = 18, hjust = 0.2),
plot.subtitle = element_text(size = 14, hjust = 0.2),
plot.caption = element_text(size = 14)
)
) &
theme(
legend.position = "bottom",
legend.box = "horizontal",
legend.title = element_text(size = 14, hjust = 0.5)
)
Visualization
Alt text: Bar chart titled “Success Rate of Top 4 Peaks Attempted by all Nations (2020-2024)” showing success rates for Everest (88.9%), Ama Dablam (93.2%), Lhotse (88.9%), and Manaslu (52.6%). Bar height indicates success rate, with colors ranging from yellow (180 attempts) to dark purple (100 attempts) representing the number of attempts. Source: https://github.com/rfordatascience/tidytuesday.
Alt text: Bubble chart titled “National Route Preferences and Success Rates in High-Altitude Peak Expeditions (2020-2024)” showing success rates for top 3 nations across four popular peaks: Everest, Ama Dablam, Lhotse, and Manaslu. Bubbles represent attempts, with size and color indicating the number of attempts (1 to 40), and position showing success rates (0-100%). Routes include N Col-NE Ridge, N Face (Hornbein Couloir), S Col-SE Ridge for Everest; SW Ridge, W Face for Ama Dablam; S Col-W Face, W Face for Lhotse; and NE Face for Manaslu. Nations include USA, China, Nepal, India, and UK. Source: https://github.com/rfordatascience/tidytuesday.
Observation
The combined bubble plots for Everest, Lhotse, Ama Dablam, and Manaslu reveal clear patterns in national route preferences and their corresponding success rates:
Dominant Route Concentration: A significant majority of expedition attempts across these popular peaks are concentrated on a single, well-established route. For instance, the “S Col-SE Ridge” on Everest and the “W Face” on Lhotse are overwhelmingly favored, indicated by large bubbles representing high attempt volumes from nations like the USA, China, and Nepal. This suggests the existence of a ‘standard’ or ‘commercial’ route for each peak.
Marginality of Alternate Routes: While other routes were attempted, their popularity and often their success rates were considerably lower. Smaller bubbles and, at times, lower success percentages for alternative paths (e.g., Everest’s “N Col-NE Ridge”) highlight a strong collective preference for the primary, more established route, with alternatives attracting fewer expeditions.
Strategic Path-Peak Selection: The data strongly indicates that route selection is a critical determinant of expedition success, particularly for top-performing nations. Countries like the USA, China, Nepal, and India consistently favor specific path-peak combinations that demonstrate high success rates. This strategic alignment underscores a pragmatic approach where route choice is a calculated decision to optimize success probabilities.
Confirmation of Inherent Route Bias: The observed patterns in the 2020-2024 dataset reinforce a historical trend where specific routes have emerged as the most reliable. The high success rates for concentrated attempts on routes like Everest’s “S Col-SE Ridge” reflect a continuing bias towards ‘proven’ paths, likely due to well-documented passages, established infrastructure, and accumulated experience, which collectively contribute to a higher probability of success.
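The observations above are read off bubble sizes. One way to check more formally whether route choice depends on nationality is a test of independence on a nation-by-route table of attempts. The sketch below swaps in Fisher's exact test (the report does not say any formal test was run) on made-up counts, not values from the dataset. A small p-value would indicate that route preference differs across nations; it would not show that route choice causes success.

```r
# Hypothetical nation x route attempt counts for one peak (not from the data).
route_table <- matrix(
  c(20,  0,
     0, 15,
    18,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(nation = c("USA", "China", "Nepal"),
                  route  = c("S Col-SE Ridge", "N Col-NE Ridge"))
)

# Fisher's exact test of independence between nation and route choice;
# used instead of a chi-squared test because several expected counts are small.
route_test <- fisher.test(route_table)
route_test$p.value
```

With real data, the table would be built from the truncated nation-route counts, one table per peak.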
Question 2
Do Certain Agencies Have a Higher Number of Member/Personnel Deaths than Others with Respect to Season and Date?
Introduction
This question examines the correlation between expedition agencies and fatality rates, and whether any important trends appear with respect to season and date. The findings are interesting because any such correlations raise further questions about why they occur: for example, whether specific safety policies or particular weather patterns lead some agencies to have higher fatality rates than others.
Approach
For this question, we first created variables for the total deaths per expedition, as well as the percentage of deaths per expedition among members (customers), among hired staff, and overall. We then began with a general graph showing, for each agency, how many expeditions between 2021 and 2024 resulted in at least one death. Next, we zoomed in on each year, plotting both the percentage and the total number of deaths by agency (and by season). Finally, we graphed the total actual deaths (raw totals) for each agency, in case they support different conclusions than the percentage graphs.
Initially, we sought to combine all the percentage and raw-total graphs into one large graph faceted by year, but quickly realized this could not be done cleanly: the agencies on the vertical axis change from year to year, as does their order, depending on which agencies had more or fewer fatalities. We therefore placed the percentage and raw-total plots side by side on two tabs that can be switched between easily.
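The derived variables could be computed roughly as follows. This base-R sketch assumes totdeaths = mdeaths + hdeaths, pmdeaths = mdeaths / totmembers, phdeaths = hdeaths / tothired, and ptotdeaths = totdeaths / (totmembers + tothired); the report does not spell out these formulas. The agency names and counts are hypothetical, and deaths_2021_av / deaths_2021_raw reuse the object names from the plotting code only to show where such tables would plug in.

```r
# Hypothetical fatal expeditions; columns mirror the deaths tibble in this report.
deaths <- data.frame(
  year          = c(2021, 2021, 2021, 2021),
  season_factor = c("Spring", "Spring", "Autumn", "Spring"),
  agency        = c("Agency A", "Agency A", "Agency B", "Agency C"),
  totmembers    = c(35, 13, 4, 15),
  mdeaths       = c(1, 0, 3, 1),
  tothired      = c(30, 10, 4, 12),
  hdeaths       = c(0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Per-expedition death counts and proportions (assumed definitions).
deaths$totdeaths  <- deaths$mdeaths + deaths$hdeaths
deaths$pmdeaths   <- ifelse(deaths$totmembers > 0,
                            deaths$mdeaths / deaths$totmembers, NA)
deaths$phdeaths   <- ifelse(deaths$tothired > 0,
                            deaths$hdeaths / deaths$tothired, NA)
deaths$ptotdeaths <- deaths$totdeaths / (deaths$totmembers + deaths$tothired)

# Average proportion and raw total of deaths per agency and season for one year.
deaths_2021 <- deaths[deaths$year == 2021, ]
deaths_2021_av <- aggregate(ptotdeaths ~ agency + season_factor,
                            data = deaths_2021, FUN = mean)
names(deaths_2021_av)[names(deaths_2021_av) == "ptotdeaths"] <- "avg_ptotdeaths"
deaths_2021_raw <- aggregate(totdeaths ~ agency + season_factor,
                             data = deaths_2021, FUN = sum)
names(deaths_2021_raw)[names(deaths_2021_raw) == "totdeaths"] <- "total_deaths"
deaths_2021_av
deaths_2021_raw
```

Averaging proportions gives each expedition equal weight regardless of its size, which is one reason the raw totals are worth plotting alongside.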
Analysis
library(forcats)  # fct_infreq(), fct_rev(), fct_reorder()

agency_by_fatal <- exped_tidy_deadly |>
# Graph agencies by number of fatal expeditions. fct_infreq() and after_stat(count) were debugged with AI assistance after consulting the documentation.
ggplot(aes(x = fct_rev(fct_infreq(agency)), fill = after_stat(count))) +
geom_bar() +
# Flip coordinates so agency names are easier to read
coord_flip() +
#Increased number of breaks
scale_y_continuous(breaks = c(0, 2, 4, 6, 8, 10)) +
# Color from yellow (fewer fatal expeditions) to dark red (more); yellow was chosen as a low-intensity starting color
scale_fill_gradient(low = "#ffce00", high = "darkred") +
labs(
title = "Number of expeditions through the Himalayas \nthat resulted in death by Agency",
subtitle = "from 2021 - 2024",
caption = "Source: Tidytuesday",
x = NULL,
y = "Number of expeditions that resulted in at least one death",
fill = NULL
) +
theme_minimal() +
#got rid of grid to improve readability
theme(
legend.position = "none",
panel.grid.minor = element_blank(),
panel.grid.major.y = element_blank()
)

percent_2021 <- deaths_2021_av |>
#graph by descending percent total deaths by agency. Color is for season
ggplot(aes(x = fct_reorder(agency, avg_ptotdeaths, .desc = FALSE), y = avg_ptotdeaths, fill = season_factor)) +
geom_col() +
#set coordinates for better comparison between groups
coord_flip(ylim = c(0, 1)) +
# Seasonal colors: Autumn in orange, Spring in light green (character levels sort alphabetically)
scale_fill_manual(values = c("orange", "lightgreen")) +
labs(
fill = "Season",
x = "Trekking Agency",
y = "Total Percent Death (Average)",
title = "Percent total deaths by Agency in 2021",
caption = "Source: TidyTuesday",
subtitle = "M is trekking member death, H is hired staff death"
) +
# Label the value axis (horizontal after coord_flip) as percentages
scale_y_continuous(
breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
labels = c("0%", "20%", "40%", "60%", "80%", "100%")
) +
# Label each expedition's own percent death rather than the average, distinguishing member (M) from hired staff (H) deaths
annotate("text", y = 0.2, x = 1, label = "M 6.6%") +
annotate("text", y = 0.2, x = 2, label = "H 3.3%") +
annotate("text", y = 0.3, x = 3, label = "Trek 1: M 54%") +
annotate("text", y = 0.7, x = 3, label = "Trek 2: H 10%") +
annotate("text", y = 0.3, x = 4, label = "M 20%") +
annotate("text", y = 0.4, x = 5, label = "M 75%") +
theme_minimal() +
#cleaned up grid for better readability of annotations
theme(
axis.ticks.x = element_blank(),
panel.grid = element_blank()
)

total_2021 <- deaths_2021_raw |>
#same as above, except with total deaths and not percents
ggplot(aes(x = fct_reorder(agency, total_deaths, .desc = FALSE), y = total_deaths, fill = season_factor)) +
geom_col() +
#sets coordinates for easier comparison between groups
coord_flip(ylim = c(0, 5)) +
#colored same as above for easy comparison
scale_fill_manual(values = c("orange", "lightgreen")) +
labs(
fill = "Season",
x = "Trekking Agency",
y = "Total Deaths",
title = "Total deaths by Agency in 2021",
caption = "Source: TidyTuesday",
subtitle = "M is trekking member death, H is hired staff death"
) +
#set breaks
scale_y_continuous(
breaks = c(0, 1, 2, 3, 4, 5)
) +
# Annotate with the number of treks instead: most agencies had only 1 or 2 total deaths, so per-trek labels would be superfluous and messy
annotate("text", y = 4, x = 5, label = "Total Treks: 2") +
theme_minimal() +
#clean grid for better readability
theme(
axis.ticks.x = element_blank(),
panel.grid = element_blank()
)

percent_2022 <- deaths_2022_av |>
ggplot(aes(x = fct_reorder(agency, avg_ptotdeaths, .desc = FALSE), y = avg_ptotdeaths, fill = season_factor)) +
geom_col() +
coord_flip(ylim = c(0, 1)) +
scale_fill_manual(values = c("orange", "lightgreen")) +
labs(
fill = "Season",
x = "Trekking Agency",
y = "Total Percent Death (Average)",
title = "Percent total deaths by Agency in 2022",
caption = "Source: TidyTuesday",
subtitle = "M is trekking member death, H is hired staff death"
) +
scale_y_continuous(
breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
labels = c("0%", "20%", "40%", "60%", "80%", "100%")
) +
annotate("text", y = 0.2, x = 1, label = "H 20%") +
annotate("text", y = 0.2, x = 2, label = "H 4.7%") +
annotate("text", y = 0.2, x = 3, label = "M 7.6%") +
annotate("text", y = 0.3, x = 4, label = "Trek 1: M 6.25%") +
annotate("text", y = 0.7, x = 4, label = "Trek 2: H 10%") +
annotate("text", y = 0.32, x = 4.8, label = "Trek 1: M 14.2%") +
annotate("text", y = 0.32, x = 5.2, label = "Trek 2: H 6.25%") +
annotate("text", y = 0.8, x = 5, label = "Trek 3: H 6.67%") +
annotate("text", y = 0.25, x = 6, label = "M 20%") +
annotate("text", y = 0.3, x = 7, label = "M 14.28%") +
theme_minimal() +
theme(
axis.ticks.x = element_blank(),
panel.grid = element_blank()
)

total_2022 <- deaths_2022_raw |>
ggplot(aes(x = fct_reorder(agency, total_deaths, .desc = FALSE), y = total_deaths, fill = season_factor)) +
geom_col() +
coord_flip(ylim = c(0, 5)) +
scale_fill_manual(values = c("orange", "lightgreen")) +
labs(
fill = "Season",
x = "Trekking Agency",
y = "Total Deaths",
title = "Total deaths by Agency in 2022",
caption = "Source: TidyTuesday",
subtitle = "M is trekking member death, H is hired staff death"
) +
scale_y_continuous(
breaks = c(0, 1, 2, 3, 4, 5)
) +
annotate("text", y = 4, x = 6, label = "Total Treks: 3") +
annotate("text", y = 3, x = 7, label = "Total Treks: 2") +
theme_minimal() +
theme(
axis.ticks.x = element_blank(),
panel.grid = element_blank()
)

percent_2023 <- deaths_2023_av |>
ggplot(aes(x = fct_reorder(agency, avg_ptotdeaths, .desc = FALSE), y = avg_ptotdeaths, fill = season_factor)) +
geom_col() +
coord_flip(ylim = c(0, 1)) +
scale_fill_manual(values = c("orange", "lightgreen")) +
labs(
fill = "Season",
x = "Trekking Agency",
y = "Total Percent Death (Average)",
title = "Percent total deaths by Agency in 2023",
caption = "Source: TidyTuesday",
subtitle = "M is trekking member death, H is hired staff death"
) +
scale_y_continuous(
breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
labels = c("0%", "20%", "40%", "60%", "80%", "100%")
) +
annotate("text", y = 0.2, x = 1, label = "M 5.8%") +
annotate("text", y = 0.2, x = 2, label = "Trek 1: M 4.7%") +
annotate("text", y = 0.5, x = 2, label = "Trek 2: M 6.7%") +
annotate("text", y = 0.8, x = 2, label = "Trek 3: 6.3%") +
annotate("text", y = 0.2, x = 3, label = "M 10%") +
annotate("text", y = 0.2, x = 4, label = "M 1.66%") +
annotate("text", y = 0.2, x = 5, label = "M 15.3%") +
annotate("text", y = 0.2, x = 6, label = "H 10%") +
annotate("text", y = 0.2, x = 7, label = "M 13.3%") +
annotate("text", y = 0.2, x = 8, label = "M 20%") +
annotate("text", y = 0.23, x = 9, label = "H 25%") +
annotate("text", y = 0.25, x = 10, label = "M 16.7%") +
annotate("text", y = 0.5, x = 11, label = "Trek 1: M 5.3%") +
annotate("text", y = 0.8, x = 11, label = "Trek 2: M 33.3%") +
annotate("text", y = 0.5, x = 12, label = "M 100%") +
theme_minimal() +
theme(
axis.ticks.x = element_blank(),
panel.grid = element_blank()
)

total_2023 <- deaths_2023_raw |>
ggplot(aes(x = fct_reorder(agency, total_deaths, .desc = FALSE), y = total_deaths, fill = season_factor)) +
geom_col() +
coord_flip(ylim = c(0, 5)) +
scale_fill_manual(values = c("orange", "lightgreen")) +
labs(
fill = "Season",
x = "Trekking Agency",
y = "Total Death",
title = "Total deaths by Agency in 2023",
caption = "Source: TidyTuesday",
subtitle = "M is trekking member death, H is hired staff death"
) +
scale_y_continuous(
breaks = c(0, 1, 2, 3, 4, 5)
) +
annotate("text", y = 3, x = 6, label = "Total Treks: 2") +
annotate("text", y = 3, x = 12, label = "Total Treks: 3") +
theme_minimal() +
theme(
axis.ticks.x = element_blank(),
panel.grid = element_blank()
)

percent_2024 <- deaths_2024_av |>
ggplot(aes(x = fct_reorder(agency, avg_ptotdeaths, .desc = FALSE), y = avg_ptotdeaths, fill = season_factor)) +
geom_col() +
coord_flip(ylim = c(0, 1)) +
scale_fill_manual(values = "lightgreen") +
labs(
fill = "Season",
x = "Trekking Agency",
y = "Total Percent Death (Average)",
title = "Percent total deaths by Agency in 2024",
caption = "Source: TidyTuesday",
subtitle = "M is trekking member death, H is hired staff death"
) +
scale_y_continuous(
breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
labels = c("0%", "20%", "40%", "60%", "80%", "100%")
) +
annotate("text", y = 0.2, x = 2, label = "M 6.7%") +
annotate("text", y = 0.3, x = 1, label = "Trek 1: M 4.5% H 3.3%") +
annotate("text", y = 0.7, x = 1, label = "Trek 2: H 3.2%") +
annotate("text", y = 0.2, x = 3, label = "M 10%") +
annotate("text", y = 0.2, x = 4, label = "M 11.1% H 2%") +
annotate("text", y = 0.5, x = 5, label = "M 50%") +
theme_minimal() +
theme(
axis.ticks.x = element_blank(),
panel.grid = element_blank()
)

total_2024 <- deaths_2024_raw |>
ggplot(aes(x = fct_reorder(agency, total_deaths, .desc = FALSE), y = total_deaths, fill = season_factor)) +
geom_col() +
coord_flip(ylim = c(0, 5)) +
scale_fill_manual(values = c("lightgreen")) +
labs(
fill = "Season",
x = "Trekking Agency",
y = "Total Death",
title = "Total deaths by Agency in 2024",
caption = "Source: TidyTuesday",
subtitle = "M is trekking member death, H is hired staff death"
) +
scale_y_continuous(
breaks = c(0, 1, 2, 3, 4, 5)
) +
annotate("text", y = 4, x = 4, label = "Total Treks: 2") +
theme_minimal() +
theme(
axis.ticks.x = element_blank(),
panel.grid = element_blank()
)
Visualization
# A tibble: 10 × 69
expid peakid year season season_factor host host_factor
<chr> <chr> <dbl> <dbl> <chr> <dbl> <chr>
1 EVER20101 EVER 2020 1 Spring 2 China
2 EVER20102 EVER 2020 1 Spring 2 China
3 EVER20103 EVER 2020 1 Spring 2 China
4 AMAD20301 AMAD 2020 3 Autumn 1 Nepal
5 AMAD20302 AMAD 2020 3 Autumn 1 Nepal
6 AMAD20303 AMAD 2020 3 Autumn 1 Nepal
7 AMAD20304 AMAD 2020 3 Autumn 1 Nepal
8 AMAD20305 AMAD 2020 3 Autumn 1 Nepal
9 AMAD20306 AMAD 2020 3 Autumn 1 Nepal
10 AMAD20307 AMAD 2020 3 Autumn 1 Nepal
# ℹ 62 more variables: route1 <chr>, route2 <chr>, route3 <lgl>,
# route4 <lgl>, nation <chr>, leaders <chr>, sponsor <chr>,
# success1 <lgl>, success2 <lgl>, success3 <lgl>,
# success4 <lgl>, ascent1 <chr>, ascent2 <chr>, ascent3 <lgl>,
# ascent4 <lgl>, claimed <lgl>, disputed <lgl>,
# countries <chr>, approach <chr>, bcdate <date>,
# smtdate <date>, smttime <chr>, smtdays <dbl>, …
# A tibble: 37 × 14
year season_factor host_factor nation agency totmembers
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2021 Spring Nepal USA TAGnepal Tr… 35
2 2021 Spring Nepal Russia 7 Summits A… 15
3 2021 Spring Nepal India Seven Summi… 37
4 2021 Spring Nepal China Seven Summi… 13
5 2021 Autumn Nepal Nepal TAGnepal Tr… 5
6 2021 Autumn Nepal France Pralhad Cha… 4
7 2022 Spring Nepal Greece Seven Summi… 7
8 2022 Spring Nepal USA Beyul Adven… 10
9 2022 Spring Nepal Russia 7 Summits A… 15
10 2022 Spring Nepal Nepal High Five A… 7
# ℹ 27 more rows
# ℹ 8 more variables: smtmembers <dbl>, mdeaths <dbl>,
# tothired <dbl>, hdeaths <dbl>, totdeaths <dbl>,
# pmdeaths <dbl>, phdeaths <dbl>, ptotdeaths <dbl>
Alt text: Bar graph, sourced from TidyTuesday, showing for each agency the number of expeditions between 2021 and 2024 with at least one fatality. The values are as follows: Seven Summit Treks 10, 7 Summits Adventure 2, 8K Expeditions 2, Beyul Adventure 2, Himalayan Guides 2, Pioneer Adventure 2, Satori Adventures 2, 14 Summits 1, Annapurna Treks 1, Asian Trekking 1, Expedition Himalaya 1, Glacier Himalaya Treks 1, High Five Adventures 1, Image Nepal 1, Makalu Adventure 1, Peak Promotion 1, Pralhad Chapagain 1, Shangri-La Nepal Treks 1, Snowy Horizon Treks 1, TAGnepal Trekking 1, TAGnepal Trekking (Snowy Horison pmt) 1, and Yeti Adventure 1. The graph clearly shows that Seven Summit Treks has far more fatal expeditions than any other agency.
Alt text: The bar graph shows the average percent total death for each agency that had at least one expedition with a fatality in 2021, along with the season of each trek. Pralhad Chapagain has about 80% fatality in Autumn and TAGnepal Trekking (Snowy) about 20% in Autumn, while Seven Summit Treks, 7 Summits Adventure, and TAGnepal Trekking each have less than 10%, all in Spring. The graph shows not only that Pralhad Chapagain has the highest percent death, but also that Seven Summit Treks has a much lower percent death than anticipated.
Alt text: The bar graph shows the total deaths by agency in 2021: Seven Summit Treks had 3 deaths across 2 treks in Spring, Pralhad Chapagain 3 deaths in Autumn, TAGnepal Trekking (Snowy) one death in Autumn, and TAGnepal Trekking and 7 Summits Adventure one death each in Spring. The graph shows that Seven Summit Treks and Pralhad Chapagain had the same number of actual deaths in 2021.
Alt text: The bar graph shows the average percent total death for each agency that had at least one expedition with a fatality in 2022, along with the season of each trek. The agencies of note are High Five Adventures with around 14% in Spring, Shangri-La Nepal Treks with around 12% in Autumn, and Seven Summit Treks with three treks totaling around 10% across Autumn and Spring.
Observations
- First, Seven Summit Treks has the most fatal expeditions from 2021 - 2024: 10 expeditions with at least one fatality, far more than any other agency, each of which had only one or two fatal expeditions.
Unexpectedly, the yearly percentage graphs show that Seven Summit Treks often has a low or medium percentage of fatalities.
The actual deaths were relatively similar across the board, with usually only 1 or 2 people dying per expedition.
This likely means that Seven Summit Treks runs larger groups per expedition (30+ people) than other agencies, so even though total deaths per expedition are similar, its percent fatality is much lower than that of agencies with small groups (4 to 5 people).
- However, we cannot conclude that the higher number of fatal expeditions is due to running more expeditions per year, because we looked only at the deadly expeditions rather than all expeditions.
- Spring has more fatal expeditions than Autumn from 2021 - 2024.
There is not enough data to conclude whether Autumn has a higher percent fatality than Spring among expeditions with at least one fatality (i.e., whether Spring has more fatal expeditions with only one death each, while Autumn has fewer fatal expeditions in which more people die).
We also cannot conclude whether or not this difference is due to an increased or decreased number of expeditions in each season.
- 2022 and 2023 had more fatal expeditions, and more agencies with fatal expeditions, than 2021 and 2024.
- Although there are slightly more member (customer) deaths than hired-staff deaths per expedition from 2021 to 2024, the difference is too small to reject the null hypothesis of no difference.
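Two of these observations can be made concrete with a short base-R sketch. The first part shows the group-size effect: the same single death is a very different percentage in a 35-person group than in a 4-person group (sizes chosen to echo the tibble printout above, not a specific expedition). The second part concerns the member-versus-staff comparison; the report does not say which test, if any, underlies the null-hypothesis statement, so the sketch swaps in a simple sign test on hypothetical per-expedition counts.

```r
# 1) Same death count, different group sizes (illustrative numbers only).
deaths_per_expedition <- 1
group_sizes <- c(large = 35, small = 4)
pct_fatality <- 100 * deaths_per_expedition / group_sizes
round(pct_fatality, 1)

# 2) Sign test: do member deaths exceed hired-staff deaths more often than
#    chance would predict? Counts are hypothetical, one entry per fatal expedition.
member_deaths <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 2)
hired_deaths  <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2)
diffs <- member_deaths - hired_deaths
n_informative <- sum(diffs != 0)  # ties carry no sign information
n_member_more <- sum(diffs > 0)
sign_test <- binom.test(n_member_more, n_informative, p = 0.5)
sign_test$p.value
```

Here one death is about 2.9% of the large group but 25% of the small one. In the sign test, 9 of 15 informative expeditions have more member deaths, and the two-sided p-value is about 0.61, i.e. no evidence of a real difference, which is the kind of result the last observation describes.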
Conclusion, Limitations, and Future Directions
Conclusion
Our analysis of 2020-2024 Himalayan expeditions reveals that the overwhelmingly dominant, historically proven routes are consistently preferred and yield the highest success rates. Top-performing nations strategically prioritize these established paths, underscoring route selection as a critical determinant of expedition success. Conversely, alternative routes attract far fewer attempts and often show notably lower success rates.
Seven Summit Treks has the most expeditions with at least one death, regardless of whether the deaths were members or staff. Spring has more deadly expeditions than Autumn, and 2022 and 2023 have more deadly expeditions than 2021 and 2024. There appear to be slightly more member (customer) deaths than hired-staff deaths, but the data are inconclusive. Overall, although we identified certain agencies with more expeditions that had at least one fatality, these data do not conclusively tie Himalayan expedition fatalities to one or a few specific agencies that are more egregious than the rest.
Key Limitations
Our analysis faced two primary limitations:
Data Quality: The initial dataset required significant pre-processing due to inconsistencies, particularly with consolidated route information, impacting granular analysis.
Data Sparsity: A high variable count relative to the number of entries in our filtered dataset limited the ability to draw universally conclusive findings and complex statistical relationships.
For Question 2, the analysis would benefit from a broader question that compares fatal and non-fatal expeditions. Additionally, raw death-total graphs were included because certain agencies appeared to have high percent death, which was actually an artifact of percentage calculations on small groups. Lastly, we were unable to combine these graphs with patchwork, so each graph takes up more space.
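The small-group problem can also be seen in how uncertain a percentage is when the group is tiny. A minimal sketch using the exact (Clopper-Pearson) binomial confidence interval that base R's binom.test() reports, with illustrative group sizes rather than specific expeditions: one death among 4 people versus one death among 35.

```r
# One death in a small group vs. a large group (illustrative sizes only).
small <- binom.test(x = 1, n = 4)
large <- binom.test(x = 1, n = 35)

# 95% exact confidence intervals for the underlying fatality proportion.
ci <- rbind(small = small$conf.int, large = large$conf.int)
colnames(ci) <- c("lower", "upper")
round(100 * ci, 1)  # in percent

# Interval widths: the small-group estimate is far less precise.
width <- ci[, "upper"] - ci[, "lower"]
width
```

The 1-in-4 interval runs from under 1% to roughly 81%, so a single small expedition can produce a dramatic percentage that says little on its own; plotting raw totals alongside the percentages guards against over-reading such cases.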
Future Directions
Building on these insights, future research should focus on:
In-depth Dataset Exploration: Leveraging the comprehensive original Himalayan Database (https://www.himalayandatabase.com/hbn2019.html) to conduct more robust analyses and explore broader historical trends.
Expanded Variable Analysis: Investigating a wider range of factors, such as leadership roles, team sizes, and seasonal influences. Fatality analyses in particular should account for group size, include non-fatal expeditions, and consider other potentially correlated variables.
Predictive Modeling: Developing models to forecast expedition success based on various input factors, aiding future planning and risk management.